ORIGINAL ARTICLE

The impact of source contribution uncertainty on the effects
of source-specific PM2.5 on hospital admissions: A case
study in Boston, MA

Marianthi-Anna Kioumourtzoglou 1 , Brent A. Coull 2 , Francesca Dominici 2 , Petros Koutrakis 1 , Joel Schwartz 1 and Helen Sun 3 

Epidemiologic studies of particulate sources and adverse health do not account for the uncertainty in the source contribution 
estimates. Our goal was to assess the impact of uncertainty on the effect estimates of particulate sources on emergency 
cardiovascular (CVD) admissions. We examined the effects of PM2.5 sources, identified by positive matrix factorization (PMF)
and absolute principal component analysis (APCA), on emergency CVD hospital admissions among Medicare enrollees in Boston,
MA, during 2003-2010, given stronger associations for this period. We propagated uncertainty in source contributions using a block
bootstrap procedure. We further estimated average across-methods source-specific effect estimates using bootstrap samples. We 
estimated contributions for regional, mobile, crustal, residual oil combustion, road dust, and sea salt sources. Accounting for 
uncertainty, same-day exposures to regional pollution were associated with an across-methods average effect of 2.00% 
(0.18, 3.78%) increase in the rate of CVD admissions. Weekly residual oil exposures resulted in an average 2.12% (0.19, 4.22%) 
increase. Same-day and 2-day exposures to mobile-related PM2.5 were also associated with increased admissions. Confidence
intervals when accounting for the uncertainty were wider than otherwise. Agreement in PMF and APCA results was stronger 
when uncertainty was considered in health models. Accounting for uncertainty in source contributions leads to more stable 
effect estimates across methods and potentially to fewer spurious significant associations. 
INTRODUCTION

Particulate air pollution has consistently been associated with a 
variety of adverse health outcomes, including increased mortality 1 
and hospital admissions. 2 Results from a growing number of
epidemiological studies suggest that some fine particulate matter (PM2.5)
constituents are more toxic than others. 3 Multicity studies of acute
PM effects, for example, show that the relation between ambient
PM2.5 and hospital admissions varies by city, 4,5 with this variation
attributed in part to differences in particle composition. 6,7 

Several factor analysis and source apportionment methods have 
been developed to apportion sources of ambient PM2.5, 8-10 the
estimates of which have subsequently been used in epidemiological
studies to investigate the association between source-specific PM2.5
and health. Ozkaynak and Thurston, 11 for example, found increased 
risk of mortality after exposure to particles associated with iron/steel 
emissions and coal combustion. Laden et al. 12 used specific rotation 
factor analysis to show that particles from mobile and coal 
combustion sources are associated with significant increases in 
daily mortality in six US cities. Using positive matrix factorization 
(PMF) and chemical mass balance, Sarnat et al. 13 found significant 
associations between mobile sources and biomass burning and 
cardiovascular disease (CVD)-related hospital admissions in Atlanta, 
GA, USA. Significant effects between mobile sources and 
cardiovascular hospital admissions were also observed by Lall 
et al. 14 in New York, NY, USA, using PMF. Although these studies
used different source apportionment methods, an
intercomparison of such methods suggests that
they produce similar findings in
health effect studies. 15

None of these studies, however, has accounted for the
uncertainty in the prediction of the source contributions. By
failing to account for uncertainty in the source apportionment,
these studies ignored the uncertainty associated with factor estimation, that is,
source contribution estimation, potentially leading to an
underestimation of the uncertainty (i.e., standard errors)
associated with the estimated health effects. 16,17 To address this issue,
we investigate the impact of propagating the uncertainty
associated with prediction of source contributions on the
association between specific PM2.5 sources and emergency CVD
hospital admissions, using two source apportionment methods
and air pollution and health data from Boston, MA, USA.



METHODS 

Data Collection 

Hospital admissions data. Daily counts of emergency hospital admissions 
were obtained from billing claims of Medicare enrollees >64 years old for 
Boston, MA, USA (2003-2010), for the counties Middlesex, Essex, Norfolk, 
and Suffolk. Only admissions that occurred through the emergency 
department were included, as scheduled admissions are likely not related 
to short-term air pollution exposures. Each billing claim contains
information on the date of hospitalization, age, residence county, and 
primary and secondary diagnoses. Using codes from the International 
Classification of Diseases, 9th Revision (ICD-9; Centers for Disease Control
and Prevention 2008), we considered hospital admissions for all CVD 
conditions (codes 390-429). 

Air pollution and weather data. Ambient concentrations of PM2.5 and its
components were measured at the EPA PM Center stationary ambient
monitoring supersite, located in downtown Boston on the roof of the
Countway Library at the Harvard Medical School. The 24-h integrated PM2.5
samples were collected using a Sequential Sampler (Partisol Model 2300 by
Rupprecht and Patashnick, Albany, NY, USA) at a flow rate of 16.7 LPM.
PM2.5 samples were analyzed for mass using gravimetric analysis and for
elemental concentrations using X-Ray Fluorescence at the Harvard School
of Public Health. Continuous black carbon (BC) concentrations were
measured every 5 min using an Aethalometer (model AE-14 by Magee
Scientific, Berkeley, CA, USA). Temperature and dew point were obtained 
from the hourly surface observations of the National Weather Service First 
Order Station at Logan Airport (East Boston). 

Data Analysis 

For our study, we restricted analyses to concentrations measured between 
2003 and 2010. During this period, we observed strong source-specific 
PM2.5 effects, allowing us to assess how propagation of uncertainty affects
non-null estimates. Data from a longer time period (1998-2010) showed 
weaker associations, with a significant association only for mobile sources. 
For the purposes of this study, strong effects were crucial, as uncertainty 
propagation when the effect estimates are null would not be meaningful. 

All statistical analyses were conducted using the R Statistical Software, 
version 2.14.1 (Foundation for Statistical Computing, Vienna, Austria). 

Source apportionment. Sources of PM2.5 in downtown Boston were
identified using two different methods: US Environmental Protection
Agency PMF 3.0 (ref. 9) and the absolute principal component analysis
(APCA). 10 These methods were selected because both have been
extensively used in fine particulate source apportionment in the past. 14,18,19

Briefly, both methods assume unknown source profiles. APCA is an
extension of principal components analysis (PCA); once PCA is conducted,
the principal components are rotated and the component scores rescaled
relative to a reference of zero pollution. Total PM2.5 mass concentrations are
then regressed on these factor scores and thus both component scores and 
loadings are provided. 10 PMF, on the other hand, minimizes a least squares 
function that also takes into account the uncertainty associated with the 
daily measurements of each element. PMF further imposes the restriction 
that both source profiles and contributions are non-negative. 9 
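
For orientation, the weighted least squares criterion that PMF minimizes is commonly written as below. The notation is ours and is not taken from the study: x_{ij} denotes the concentration of species j on day i, u_{ij} its measurement uncertainty, g_{ik} the contribution of factor k on day i, f_{kj} the profile of factor k, and p the number of factors.

```latex
Q \;=\; \sum_{i=1}^{n}\sum_{j=1}^{m}
\left(\frac{x_{ij}-\sum_{k=1}^{p} g_{ik}\,f_{kj}}{u_{ij}}\right)^{2},
\qquad g_{ik}\ge 0,\quad f_{kj}\ge 0 .
```

Dividing each residual by u_{ij} down-weights poorly measured species, and the non-negativity constraints correspond to the restriction on profiles and contributions noted above.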

For the APCA method, we used the SAS software, version 9.3 (SAS 
Institute, Cary, NC, USA), whereas for PMF we used the ME-2 multilinear 
engine executable and script files under the EPA public license, available 
with installation of EPA PMF 3.0. 

Base health analyses. We use the term "base analyses" for all analyses
conducted without considering the impact of uncertainty.

We fit case-crossover analyses using a time-stratified approach; control
days were chosen bidirectionally from the same year, month,
and day of week as the admission. 20,21 This design eliminates confounding
by personal characteristics that do not change over time 22 and limits
confounding by seasonality and long-term trends. 23
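
The time-stratified referent selection can be illustrated with a minimal Python sketch. The function name `referent_days` is hypothetical, and the sketch assumes that every other day in the stratum (same year, month, and weekday) serves as a control day; it is an illustration, not the code used in the study.

```python
from datetime import date, timedelta


def referent_days(event_day: date) -> list[date]:
    """Control days sharing year, month, and weekday with event_day.

    Assumption: all other matching days in the stratum are used as
    controls, whether they fall before or after the event day.
    """
    controls = []
    d = event_day.replace(day=1)  # first day of the event's month
    while d.month == event_day.month:
        if d.weekday() == event_day.weekday() and d != event_day:
            controls.append(d)
        d += timedelta(days=1)
    return controls


if __name__ == "__main__":
    for day in referent_days(date(2005, 3, 16)):
        print(day.isoformat())
```

Because controls are drawn from both sides of the event day within a fixed stratum, each admission is compared only with days from the same season and the same day of the week.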

We ran conditional logistic regressions for total CVD admissions, for each 
of the two source contribution data sets, generated by PMF and APCA. We 
ran our models including all factors simultaneously to account for potential 
confounding by factors, and separately by factor. We used linear 
adjustments for same-day temperature, same-day dew point, and 2-day 
moving averaged temperature starting from the day before to 3 days 
before admission. We additionally adjusted for PM2.5, given previous
associations with the health outcomes, differential correlation with the
factors included in the model, and with other pollutants not included in
the model that could act as confounders. 24

We examined associations for moving averages of 1, 2, 4, 6, and 7 days. 
Results are presented as % change in CVD admissions per IQR increase of 
each estimated source contribution. 
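
As a sketch of how the reported quantities relate to the model output, the Python snippet below computes a trailing moving average and converts a conditional logistic regression coefficient into a % change per IQR increase. All function names are hypothetical. We assume that a k-day moving average covers the day of admission and the k-1 preceding days, and that the % change is 100 x (exp(beta x IQR) - 1); the quartile convention is that of Python's `statistics.quantiles` and may differ from the one used in the study.

```python
import math
import statistics


def moving_average(series, window):
    """Trailing mean over the current day and the window-1 previous days.

    Returns None for days without enough history (an assumption).
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out


def iqr(values):
    """Interquartile range using the default quartile rule of the stdlib."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def pct_change_per_iqr(beta, iqr_width):
    """% change in the admission rate per IQR increase in exposure,
    where beta is the log-odds coefficient per unit of exposure."""
    return 100.0 * (math.exp(beta * iqr_width) - 1.0)
```

For example, a coefficient of log(1.02) per unit with an IQR of one unit corresponds to a 2% increase.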

Uncertainty propagation. To assess the uncertainty associated with the 
estimation of the source contributions, we used a block bootstrap
procedure, allowing for overlapping blocks, to account for serial correlation
among observations in the time series. 25 We selected the block size by 
assessing the autocorrelation and deseasonalized autocorrelation 
of the base analysis identified factors. No serial correlation remained 
after 15 days for any of the factors and we, therefore, conservatively 
selected blocks of 20 days. 
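
A minimal Python sketch of an overlapping (moving) block bootstrap is given below. The function name is hypothetical, and the sketch assumes that blocks of consecutive days are drawn with replacement from all admissible start days and that the last block is trimmed to restore the original series length; the exact variant used in the study is not specified beyond the use of overlapping blocks.

```python
import random


def moving_block_bootstrap_indices(n, block_len, rng):
    """Day indices for one resample built from overlapping blocks.

    n         -- number of days in the original series
    block_len -- number of consecutive days per block (20 in the text)
    rng       -- a random.Random instance, for reproducibility
    """
    if not 0 < block_len <= n:
        raise ValueError("block_len must be between 1 and n")
    starts = range(n - block_len + 1)  # every admissible block start
    idx = []
    while len(idx) < n:
        s = rng.choice(starts)
        idx.extend(range(s, s + block_len))
    return idx[:n]  # trim the final block so the resample has n days


if __name__ == "__main__":
    sample = moving_block_bootstrap_indices(100, 20, random.Random(1))
    print(sample[:25])
```

The returned indices would be applied to all daily series at once, so that days within a block keep their original order and serial correlation.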

We created 1500 bootstrap samples. For each bootstrap sample we
apportioned PM2.5 sources using both the PMF and APCA methods,
fixing a priori the number of factors at the number selected in the base analysis. We
then "mapped" the identified factors to the base factors, that is, matched 
each bootstrap factor to the base factor with which it had the strongest 
correlation. We designated a factor as unmapped if its correlation with the 
base factor was <0.60. Subsequently, for both PMF and APCA and each 
bootstrap sample, we ran the health models, including unmapped factors. 
The health effect estimate for each factor, and for each source 
apportionment method, was estimated as the median of the distribution 
of the 1500 effect estimates. Confidence intervals (CIs) were calculated using 
the 2.5% and 97.5% percentiles of that distribution. We assessed the % 
change in the width of the CIs of these estimates compared with the base 
results for both source apportionment methods, directly on the regression 
coefficients and not the % change per IQR increase of each factor. 
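
The factor mapping step can be sketched in Python as follows. The function names and the dictionary layout are hypothetical, and the sketch assumes that the base and bootstrap contribution series are aligned on the same resampled days and that none of the series is constant; only the 0.60 threshold is taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def map_factors(base, boot, threshold=0.60):
    """Match each bootstrap factor to its most correlated base factor.

    base, boot -- dicts mapping a factor label to daily contributions
    Returns a dict: bootstrap label -> base label, or None if unmapped.
    """
    mapping = {}
    for boot_name, boot_series in boot.items():
        best_name, best_r = max(
            ((name, pearson(series, boot_series)) for name, series in base.items()),
            key=lambda pair: pair[1],
        )
        mapping[boot_name] = best_name if best_r >= threshold else None
    return mapping
```

A bootstrap factor whose strongest correlation falls below the threshold is returned as `None`, corresponding to an unmapped factor.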

In addition, we calculated the across-methods average factor-specific 
effect estimate. For each bootstrap sample we calculated the average health 
effect estimate for each factor identified by the two methods. The across- 
methods average estimate for each factor was then estimated as the median 
of the distribution of the 1500 averages and its CIs were calculated using the 
2.5% and 97.5% percentiles of that distribution. 
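
The bootstrap summaries described above can be sketched in Python as follows. The function names are hypothetical, and the linear-interpolation percentile rule is an assumption, as the text does not state which percentile definition was used.

```python
import statistics


def percentile(sorted_vals, p):
    """Percentile (p in [0, 100]) of a sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_summary(estimates):
    """Median and 2.5th/97.5th percentiles of bootstrap effect estimates."""
    vals = sorted(estimates)
    return statistics.median(vals), percentile(vals, 2.5), percentile(vals, 97.5)


def across_methods_summary(pmf_estimates, apca_estimates):
    """Average the two methods within each bootstrap sample, then summarize."""
    averages = [(a + b) / 2.0 for a, b in zip(pmf_estimates, apca_estimates)]
    return bootstrap_summary(averages)


def ci_width_pct_change(base_ci, boot_ci):
    """% change in CI width relative to the base analysis."""
    base_w = base_ci[1] - base_ci[0]
    boot_w = boot_ci[1] - boot_ci[0]
    return 100.0 * (boot_w - base_w) / base_w
```

Averaging within each bootstrap sample before summarizing keeps the pairing of the PMF and APCA estimates, so the resulting interval reflects the joint variability of the two methods without requiring an analytical variance.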

A flow diagram of the methods employed is presented in the 
Supplementary Material. 



RESULTS 

Descriptive statistics for PM2.5 and the species included in our
analyses are presented in the Supplementary Table S1. In Boston, 
a median of 58 daily CVD-related hospitalizations was observed. 

Base Source Apportionment 

Six factors were identified by both PMF and APCA: regional, 
mobile, and crustal sources, residual oil combustion, road dust, 
and sea salt (Supplementary Table S2 and Supplementary Figure S1). 
Overall, the factors identified by the two methods were strongly 
correlated, with across-method by-factor correlation coefficients 
ranging from 0.79 to 0.98, with the exception of road dust 
(r=0.33) (Supplementary Figure S2). Within methods, we observed 
larger correlations across factors in the PMF solution, with 
maximal correlation between road and mobile sources (r=0.45), 
and orthogonal factors in the APCA solution (Supplementary 
Table S3). 

Base Health Analyses 

For both source apportionment methods, stronger associations 
were observed for mobile and regional sources and residual oil 
combustion, whereas inconsistent associations across methods 
were observed for road dust and crustal sources (Figures 1-6 and 
Supplementary Table S4). 

When all factors were included in the health model
simultaneously, we observed strong positive associations between
regional PM2.5 sources and CVD admissions for the shorter
exposure durations; for instance, same-day exposure to regional
PM2.5 was associated with an increase in the rate of admissions of
1.44% (-0.01, 2.90%) for PMF and 2.35% (0.77, 3.95%) for APCA.

We also observed positive effects for mobile sources for same-day
exposures (increase in the rate of CVD admissions per IQR increase
of the factor: 0.89% (95% CI: -0.06, 1.86%) for PMF and 0.76%
(-0.04, 1.58%) for APCA). Similar increases were observed for
4-day averaged exposures. For 6- and 7-day averaged exposures, 
we only observed strong positive effects for PMF, whereas for 
APCA the effects were not as strong. 

For residual oil combustion we observed strong positive
associations for all exposure windows, with effects increasing
with exposure duration. The strongest associations were observed
for weekly averaged exposures (PMF: 1.89% (0.52, 3.29%) and
APCA: 2.15% (0.71, 3.62%)).

Figure 1. Percent change in total CVD hospital admissions per IQR increase in regional PM2.5 factor for all exposure windows when all factors
were simultaneously included in the health model.

We found strong negative effects for the APCA-identified road 
dust for same-day and 2-day averaged exposures. We saw null 
effects for longer moving averages and for all exposure windows 
when PMF-identified road dust was used. We also found no
association between crustal and salt PM2.5 and the rate of
admissions, with the exception of same-day exposure to the
APCA-identified crustal factor.

We found similar results when we ran models containing PM2.5
contributions from a single source (Supplementary Figures S4-S8). 

These effect estimates, however, likely have invalid inferences, 
as the uncertainty in the source contributions has not been 
propagated into the health models. 

Uncertainty Propagation 

In all 1500 bootstraps, we mapped the PMF-identified factors to 
the PMF base factors. For APCA, 6-10% of all bootstraps (across all 
exposure windows) yielded unmapped factors. Of these, 96% were 
unmapped to the factor sea salt and <4% to regional sources. The 
distributions of the correlations between the base and the 
bootstrap factors are presented in Supplementary Table S5. 

The impact of accounting for uncertainty in source
contributions on the effect estimates and their CIs, when all factors were
simultaneously included in one model, is shown in Figures 1-6 
and Supplementary Table S4. The % change in the CI widths of the 
regression coefficients is presented in Table 1. 

We did not observe any systematic changes in the effect 
estimates compared with the base case when we accounted for the 
uncertainty in the source contributions, suggesting that the original 
estimates did not incur noticeable bias. We observed strong 
associations between mobile and regional sources and increased 
rate of CVD admissions for shorter exposure durations (same-day 
exposure and 2- and 4-day moving averages) and between residual 
oil combustion and rate of CVD-hospitalizations for 4- to 7-day 
moving averages (Figures 1-6 and Supplementary Table S4). 

For 2003-2010, when all factors were included in the health
model simultaneously, we observed strong associations between
regional PM2.5 sources and CVD admissions for the shorter
exposure durations. Regional PM2.5 was associated with a 2.01%
(0.21, 3.69%) increase in the rate of admissions for PMF and 1.96%
(-0.07, 4.19%) for APCA for same-day exposures and 1.58%
(-0.36, 3.67%) increase in the rate of admissions for PMF and
1.60% (-0.60, 4.09%) for APCA for 4-day averaged exposures.

Mobile sources were also associated with an increase in the rate
of CVD admissions for same-day and 2-day averaged exposures:
1.44% (95% CI: 0.02, 3.11%) for PMF and 0.95%
(-0.31, 2.20%) for APCA per factor IQR increase for 2-day
exposures. Similar effects were observed for 4-day averaged
exposures. Contrary to the base results, no strong effects were 
observed for longer moving averages for either method, when 
uncertainty was propagated. 

For residual oil combustion, we no longer observed strong 
positive associations for all exposure windows. In contrast to the 
base results, with significant APCA results for same-day and 2-day 
exposures, we only observed strong associations for both 
methods for exposures longer than 4 days, with strongest effects 
observed for weekly exposures: 2.20% (0.25, 4.30%) for PMF and 
2.04% (0.08, 4.15%) for APCA. 

For crustal, road dust, and salt PM2.5, we observed mostly null
effects, with the exception of 2-day exposure to APCA-identified 
road dust (-0.69% (-1.67, 0.09%)). This was in contrast to our 
base findings, showing that same-day and 2-day APCA-identified 
road dust and same-day APCA-identified crustal were strongly 
related to hospital admissions, but not when PMF was used. 

Average effects across methods. When averaging across PMF and 
APCA, the associations between mobile and regional sources and 
increased rate of CVD admissions remained strong for shorter 
exposure durations (same-day exposure and 2- and 4-day moving 
averages), and between residual oil combustion and rate of CVD 
hospitalizations for 4- to 7-day moving averages (Figures 1-6). 
For example, same-day exposures to mobile and regional 
sources were associated with a 1.07% (0.08, 2.28%) and 2.00% 
(0.18, 3.78%) increase in the rate of CVD admissions, whereas 
weekly exposures to residual oil combustion were associated 
with a 2.12% (0.19, 4.22%) increase in the rate of hospitalizations 
(Supplementary Table S7). The results were consistent 
when the factors were included separately in the health 
models (Supplementary Tables S6 and S8 and Supplementary 
Figures S4-S8). 

Figure 2. Percent change in total CVD hospital admissions per IQR increase in mobile PM2.5 factor for all exposure windows when all factors
were simultaneously included in the health model.

Figure 3. Percent change in total CVD hospital admissions per IQR increase in residual oil PM2.5 factor for all exposure windows when all
factors were simultaneously included in the health model.



Confidence intervals. For both source apportionment methods, all 
factors, and exposure windows, we found that the width of the CIs 
increased when uncertainty was propagated, although with no 
apparent relationship to exposure durations. The increase in the CI 
widths was smallest for factors that were identified in a more
stable manner and were consistent across source apportionment
methods, such as regional and crustal sources. For
both PMF- and APCA-identified factors, the correlation between 
base and bootstrap factors was close to one for regional and 
crustal sources (0.981-0.997). When comparing with the base 
results, after propagating uncertainty, the CI widths of the 
regression coefficients for regional sources increased by 16.1- 
21.5% for PMF and 18.2-33.0% for APCA across exposure 
windows. Correspondingly, the increase in the CI widths of the 
regression coefficients for crustal sources ranged from 10.6 to
31.3% for PMF and from 14.8 to 41.3% for APCA across exposure
windows. 

Conversely, the increase in the CI widths was much larger when 
the correlations between the base and bootstrap factors were not 
as strong. For example, for PMF-identified residual oil, for which 
the 2.5 percentile of the distribution of correlations was 0.790, the 
increase in the CI widths of the regression coefficients ranged 
between 89.3 and 200.8%. Moreover, for APCA-identified salt, for 
which we observed unmapped factors, the increase in the CI 
widths of the regression coefficients ranged between 341.4 and 
803.6% as compared with the base results. 

DISCUSSION 

We investigated the impact of uncertainty in source contributions
on the association between PM2.5 sources identified using two
source apportionment methods and CVD emergency hospital
admissions. For our selected study period, we observed strong
associations between shorter exposures to mobile and regional
sources and longer durations of exposure to residual oil combustion,
and increased rate of CVD emergency admissions. These
associations remained strong after propagating the uncertainty in
the source contributions. Moreover, strong associations observed
in the base analysis between same-day exposures to road dust and
crustal sources and CVD admissions were no longer significant
after accounting for uncertainty in source contributions.

Figure 4. Percent change in total CVD hospital admissions per IQR increase in crustal PM2.5 factor for all exposure windows, when all factors
were simultaneously included in the health model.

Figure 5. Percent change in total CVD hospital admissions per IQR increase in road dust PM2.5 factor for all exposure windows when all factors
were simultaneously included in the health model.

As the two source apportionment methods we employed differ 
in their approach, it was not surprising that their source 
contributions and subsequent health estimates would also differ, 
although in most cases not significantly, with similar effect
estimates and widely overlapping CIs. This is in agreement with
findings from the workshop on the source apportionment of PM
health effects 15,19,26 that reported that the choice of source
apportionment method contributes ~15% to the uncertainty of 
the health effect estimates. 26 

Furthermore, we demonstrated that failure to account for 
uncertainty in the source contributions, regardless of the choice of 
method, leads to invalid inferences, that is, overly narrow
CIs of the health effect estimates. The increase in the CI widths 
depends mainly on how consistently a factor is identified, and also 
on the statistical power one has to observe any effects that factor 
might have. For example, in our study, regional sources were very 
consistently identified across methods and bootstraps and we 
therefore saw the smallest increase in the CI widths. Conversely,
although salt was consistently identified with PMF, the lack of
daily variability of the factor, and consequently decreased
statistical power, led to large increases in the CI widths. Moreover,
accounting for uncertainty contributed to fewer spurious
significant results and more consistent findings across methods,
as was the case with our road dust and crustal findings.

To account for the differences in the effect estimates obtained
by the two different source apportionment methods, we also
estimated average across-methods source-specific effects using
the bootstrap effect distributions. Estimation of effects across
methods would otherwise not be possible, as the derivation of the
analytical form of the variance of an average effect, which would
also incorporate the uncertainty in the estimation of the source
contributions by both methods, is not straightforward. Use of
bootstrap effect estimate distributions allows us to consider
differences in the source apportionment methods while
accounting for uncertainty in their estimation, akin to an
ensemble of their resulting health effect estimates. Although
ensemble-trained source apportionment methods have been
previously developed, 27,28 to the best of our knowledge, this is
the first effort to combine across-methods source-specific health
effect estimates.

An across-methods source-specific effects ensemble reflects the
weight of the evidence as well as addresses potentially conflicting
signals at the same time. For example, for mobile sources, both
source apportionment methods yielded positive, more similar
effect estimates for shorter exposure durations, whereas for longer
moving averages the base effects were conflicting, with
significantly positive PMF, but not APCA, effects. Accounting for
uncertainty, the pattern became more clear, with better agreement
between the effects estimated by the two source apportionment
methods. The same pattern was also clear in the across-methods
average effect, with consistency across the two methods yielding
reduced variability compared with the individual effect estimates.

In terms of potential bias when failing to account for
uncertainty, in a simulation study, Nikolov et al. 16 observed
biased health effect estimates when they did not account for
uncertainty in the source contributions. However, their study
design, study population, and statistical analyses were very
different than ours, as they investigated cardiovascular effects in
a much smaller sample of dogs and the bias occurred in a linear
health model. In this study, we did not observe any significant bias
in the health effect estimates of our base analyses. Even though
the effect estimates were in cases somewhat different in the
bootstrap results, any oscillations occurred well within the CIs and
thus do not appear as systematic bias.

Our study is limited by the use of measurements conducted at a
single monitor; some PM2.5 components and sources, moreover,
are more spatially variable than others. A study also addressing
the impact of such errors, while accounting for uncertainty
propagation, would provide a more comprehensive picture, as we
were not able to address exposure error with our analyses.
Furthermore, the effects of PM2.5 sources on health have been
shown to be heterogeneous in time and space, and thus more
studies should be conducted to examine factors contributing to
this heterogeneity. Finally, with our analyses, our goal was to
address the impact of uncertainty specifically associated with the
estimation of source contributions. We, therefore, did not examine
the impact of multiple sources of uncertainty, such as
measurement error, spatial misalignment, and error associated with XRF
analysis.

Figure 6. Percent change in total CVD hospital admissions per IQR increase in salt PM2.5 factor for all exposure windows when all factors were
simultaneously included in the health model.

Table 1. Percent change in the CI width of the regression coefficients
for each factor, as compared with the base case, when all factors were
simultaneously included in the health model.

To our knowledge, this has been the first study to quantify the
impact of uncertainty in the source contributions on health effect 
estimates, employing two widely used source apportionment 
methods. We demonstrated that uncertainty propagation results 
in increased CI widths. Finally, our findings qualitatively extend to 
other cases in which exposure estimates are used as "true" 
exposures in health analyses. 

APCA, absolute principal component analysis; CVD, cardiovascular
disease; PM2.5, fine particulate matter; PMF, positive matrix
factorization